Calibration curves are an important part of many measurement processes. The user of a fitted calibration
curve must know its precision and accuracy. These are determined in a timely fashion using the data iteratively. 
This paper gives a method that divides the data into training and test groups. The test group is iteratively 
checked to see that a prechosen nominal confidence interval probability of coverage is met. If on the basis of this 
check the calibration experiment is completed, the nominal probability level is shown to still be valid.

1. Introduction

Calibration curves are an important part of many 
measurement processes. The user of a fitted calibration 
curve must know its precision and accuracy [1], and
these are determined in a timely fashion by using the 
data iteratively. This paper gives a method that divides 
the data into training (calibration curve-producing) and 
test (check) groups. The test group is iteratively 
checked to see that a prechosen nominal confidence 
interval probability of coverage is met. If on the basis of 
this check the calibration experiment is completed, the 
nominal probability level is shown to still be valid. 

We assume that the measurement process has negli- 
gible drift. This is only partially checked by the iterative 
calibration technique; of course, routine application of 
control chart procedures is a must [2]. 

It is also assumed that many measurements are taken 
between calibrations. Under this circumstance particu- 
larly appropriate statistical calibration procedures are 
found in Scheffe [3], Lieberman, Miller, and Hamilton 
[4], and Knafl, Sacks, Spiegelman, and Ylvisaker [5].
We concentrate on the Scheffe procedure; it is demonstrated
on an engineering example in Lechner, Reeve,
and Spiegelman [6]. 

All of these procedures produce interval estimates
such that the true value is contained in $100(1-\alpha)\%$ of
them in the long run with probability $1-\delta$. The two
probability levels $\alpha$ and $\delta$ are chosen by the calibrator.

In order to describe the iterative procedure, the nec- 
essary notation is given. 

2. Notation and Method 

There are two fundamental variables: $Y$, which is a nonstandard measurement of a property, and $x$, which is an exact standard or certified value of a possibly different property. For the example in section 4, $x$ represents the gravimetric value (mass) of liquid in a tank (fig. 1) and $Y$ represents differential pressure.

These two variables are related by the equation $Y = H\beta + \sigma\epsilon$, where the terms of the equation are defined below. The other observables are $Y_i^*$, $i = 1, 2, \ldots$, and they correspond to unknown $x_i^*$.

Here $Y$ is an $n \times 1$ vector of observations, $H$ is an $n \times p$ full rank matrix whose $i$-th row is $h'(x_i) = (h_1(x_i), \ldots, h_p(x_i))$, $\beta' = (\beta_1, \ldots, \beta_p)$, $\epsilon$ is an $n \times 1$ vector of independent and identically distributed standard normal random variables having mean zero and covariance matrix $V(\epsilon) = I_n$, and $\sigma$ is the standard deviation. The $Y_i^*$ are post calibration observations and the goal is to estimate their associated $x_i^*$; in this paper the $x_i^*$ are taken to be unknown constants.

Figure 1. Calibrated tank located at NBS. A cubic model was used to correspond to linear deformation of all the tank walls.

The calibration curve is denoted by $m(x) = h'(x)\beta$; it is taken to be monotonic. Let the least squares estimate of $m(x)$ be denoted by $\hat m(x)$ and its variance by $\sigma^2 s^2(x)$. Initially we assume that $\sigma^2$ is known. We discuss estimating $\sigma^2$ in section 3.
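
As a minimal sketch of the least squares quantities just defined, the Python code below fits $\hat m(x) = h'(x)\hat\beta$ from the normal equations and evaluates $s^2(x) = h'(x)(H'H)^{-1}h(x)$, the standard least squares expression for the variance factor. The names `cubic_basis`, `invert`, `fit`, `m_hat`, and `s_squared` are illustrative assumptions and are not taken from the program cited in this paper. For $x$-values as large as the masses in table 1, one would probably rescale $x$ first so that $H'H$ stays well conditioned.

```python
from typing import Callable, List, Sequence, Tuple

Basis = Callable[[float], List[float]]


def cubic_basis(x: float) -> List[float]:
    """h(x) for a cubic calibration curve: (1, x, x^2, x^3)."""
    return [1.0, x, x * x, x ** 3]


def invert(a: List[List[float]]) -> List[List[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("H is not of full rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(xs: Sequence[float], ys: Sequence[float],
        basis: Basis) -> Tuple[List[float], List[List[float]]]:
    """Return (beta_hat, (H'H)^{-1}) for the model Y = H beta + sigma * eps."""
    H = [basis(x) for x in xs]
    p = len(H[0])
    hth = [[sum(r[i] * r[j] for r in H) for j in range(p)] for i in range(p)]
    hty = [sum(r[i] * y for r, y in zip(H, ys)) for i in range(p)]
    inv = invert(hth)
    beta = [sum(inv[i][j] * hty[j] for j in range(p)) for i in range(p)]
    return beta, inv


def m_hat(x: float, beta: Sequence[float], basis: Basis) -> float:
    """Fitted calibration curve h'(x) beta_hat."""
    return sum(b * h for b, h in zip(beta, basis(x)))


def s_squared(x: float, inv: List[List[float]], basis: Basis) -> float:
    """Variance factor s^2(x) = h'(x) (H'H)^{-1} h(x); Var(m_hat(x)) = sigma^2 * s^2(x)."""
    h = basis(x)
    return sum(h[i] * inv[i][j] * h[j]
               for i in range(len(h)) for j in range(len(h)))
```

Passing only the SG1 points to `fit` gives the subscripted quantities $\hat m_1(x)$ and $s_1^2(x)$ used in the steps of this section.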

Data are nearly always collected sequentially; there- 
fore it makes sense to analyze them sequentially. Once 
the measurement process is out of control additional 
measurements are of value only in identifying the prob- 
lem. If a reasonable statistical procedure is available for 
iteratively analyzing the data, as is the procedure out- 
lined in this section, then it should be used. This will 
help identify out-of-control situations early. 

Of course the ability to detect out-of-control situations depends on the calibration design, i.e., the $x$-values used for the calibration. Such designs have been discussed in detail for linear spline calibration curves [7].
As a byproduct of the present investigation we show the 
soundness of the advice in the cited work against using 
the exact optimal design. In fact efficiency under an 
assumed model and an ability to check when this model 
holds are competing demands. 

A procedure is given for checking in an ad hoc fashion the validity of the previous assumptions. The checks are deliberately for coverage probabilities rather than directly for the assumptions, i.e., the stated $(1-\alpha)$ uncertainty level is checked. This is an indirect check on the underlying assumptions. If an assumption is marginally violated and yet the $1-\alpha$ level is met, the author sees little reason to doubt the calibration procedure. If the nominal level is not met, then the calibrator is expected to at least check his measurement procedure and possibly reset his equipment. The novelty of this procedure is that if the experiment is carried to completion, the nominal levels $(1-\alpha)$ and $(1-\delta)$ remain valid.

Our procedure is as follows: 

Step 1. After a reasonable amount of data is collected, the data are divided into two groups, SG1 and SG2. New data are placed in either group. Ways in which this may be done are given as comments at the end of this section. Each group should contain approximately half the data, although under some circumstances other divisions are reasonable (see section 4). The partitioning of the available data can be done randomly or according to a well chosen statistical sampling plan (see the comments at the end of this section).

Step 2. Choose the probability levels $1-\alpha$ and $1-\delta$. In order to simplify the notation, anything calculated only from the data in SG1 has subscript 1; anything calculated from all the data has no subscript.

Step 3. Using only data from SG1, determine the least squares estimate of $m(x)$, $\hat m_1(x)$, and its variance $\sigma^2 s_1^2(x)$.

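Step 4. From the data in SG1, form the Scheffe upper and lower curves $U(x)$ and $L(x)$. The rationale is given in Scheffe [3] and Lechner, Reeve, and Spiegelman [6].

For all $x$

$$U(x) = \hat m_1(x) - \sigma\,(z_\alpha + \chi_\delta(p)\, s_1(x)),$$
$$L(x) = \hat m_1(x) + \sigma\,(z_\alpha + \chi_\delta(p)\, s_1(x)).$$

With these signs and an increasing calibration curve, $L^{-1}(Y) \le U^{-1}(Y)$, so that $U$ and $L$ give the upper and lower limits for $x$.

Here $z_\alpha$ is the two-tailed $\alpha$ point of a standard normal; $\chi_\delta(p)$ is the upper $\delta$ point of the chi-squared distribution with $p$ degrees of freedom and $s_1^2(x) = h'(x)(H_1'H_1)^{-1}h(x)$, where $H_1$ contains the rows of $H$ for the data in SG1.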
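
The curves of step 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the cited program: the constants $z_\alpha$ and $\chi_\delta(p)$ are passed in as plain numbers (for instance from tables), both curves are assumed to be increasing on the calibration region `[lo, hi]`, and the names `scheffe_band`, `invert_increasing`, and `x_interval` are hypothetical.

```python
from typing import Callable, Tuple

Curve = Callable[[float], float]


def scheffe_band(m1_hat: Curve, s1: Curve, sigma: float,
                 z_alpha: float, chi_delta: float) -> Tuple[Curve, Curve]:
    """Return (U, L) with the sign convention of step 4: U lies below, L above m1_hat."""
    def half_width(x: float) -> float:
        return sigma * (z_alpha + chi_delta * s1(x))

    def U(x: float) -> float:
        return m1_hat(x) - half_width(x)

    def L(x: float) -> float:
        return m1_hat(x) + half_width(x)

    return U, L


def invert_increasing(curve: Curve, y: float, lo: float, hi: float,
                      tol: float = 1e-10) -> float:
    """Solve curve(x) = y by bisection, assuming curve is increasing on [lo, hi]."""
    if not (curve(lo) <= y <= curve(hi)):
        raise ValueError("y is outside the range of the curve on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if curve(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def x_interval(y: float, U: Curve, L: Curve,
               lo: float, hi: float) -> Tuple[float, float]:
    """Calibration interval [L^{-1}(y), U^{-1}(y)] for the x behind an observed y."""
    return invert_increasing(L, y, lo, hi), invert_increasing(U, y, lo, hi)
```

For example, with the made-up values `m1_hat(x) = 2x`, `s1(x) = 0.5`, `sigma = 0.1`, `z_alpha = 1.96`, and `chi_delta = 2.0`, the half width is 0.296 and an observation `y = 2.0` gives an interval of about `(0.852, 1.148)` around `x = 1`.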



Step 5 (optional). Calculate the minimum of $s_1(x)$ in the calibration region. Denote $\min s_1(x)$ by $s$. Redefine $z_\alpha$ to be the solution $q$ of the equation

$$\Phi(q + 2\chi_\delta(p)\, s) - \Phi(-q) = 1 - \alpha. \qquad (1)$$

As explained in [5], this step reduces the conservativeness of the Scheffe procedure while maintaining the validity of the probability statements.
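
Equation (1) has no closed-form solution, but its left side is increasing in $q$, so bisection suffices. The sketch below assumes that $\chi_\delta(p)$ is supplied as a number and uses only the Python standard library; the function name `adjusted_z` is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def adjusted_z(alpha: float, chi_delta: float, s_min: float,
               tol: float = 1e-12) -> float:
    """Solve PHI(q + 2*chi_delta*s_min) - PHI(-q) = 1 - alpha for q by bisection."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    shift = 2.0 * chi_delta * s_min
    if shift < 0.0:
        raise ValueError("chi_delta and s_min must be nonnegative")

    def g(q: float) -> float:
        return PHI(q + shift) - PHI(-q) - (1.0 - alpha)

    # g is increasing; g(-shift/2) = -(1 - alpha) < 0 and g(10) is about alpha > 0.
    lo, hi = -0.5 * shift, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When `s_min` is zero the solution is the usual two-tailed point (about 1.96 for `alpha = 0.05`); a positive `s_min` gives a smaller value, which is how the step narrows the band.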

Step 6. For each $(x_i, Y_i)$ in SG2 check whether or not $x_i \in [L^{-1}(Y_i), U^{-1}(Y_i)]$. Let

$$T_i = \begin{cases} 1 & \text{if } x_i \in [L^{-1}(Y_i), U^{-1}(Y_i)], \\ 0 & \text{otherwise.} \end{cases}$$

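Recall both $\hat m_1(x)$ and $m(x)$ are linear combinations of the $h_j(x)$, $j = 1, \ldots, p$. Let $\theta' = (\theta_1, \ldots, \theta_p)$ be a vector parameter in $R^p$ and $\theta'h(x) = \hat m_1(x) - m(x)$. Let

$$p(x,\theta) = \Phi(\theta'h(x) + \chi_\delta(p)s_1(x) + z_\alpha) - \Phi(\theta'h(x) - \chi_\delta(p)s_1(x) - z_\alpha).$$

Finally denote the likelihood conditioned on SG1 and thus also on $\hat\beta_1$ by $L(x,\theta)$:

$$L(x,\theta) = \prod_i p(x_i,\theta)^{T_i}\,(1 - p(x_i,\theta))^{1-T_i}.$$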



From the likelihood $L(x,\theta)$, get the maximum likelihood estimator $\hat\theta$ for $\theta$ and compute the maximum likelihood estimator for $p(x,\theta)$, $p(x,\hat\theta)$. Check whether or not $p(x,\hat\theta) \ge 1-\alpha$ for all $x$ in the calibration region. If for some $x$, $p(x,\hat\theta) < 1-\alpha$, consider the measurement process possibly defective. If for all $x$, $p(x,\hat\theta) \ge 1-\alpha$ and the calibration experiment is not finished, collect the next data point and return to step 1.
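
The pieces of step 6 can be sketched as follows. The indicator uses the fact that, for increasing curves, $x_i \in [L^{-1}(Y_i), U^{-1}(Y_i)]$ holds exactly when $U(x_i) \le Y_i \le L(x_i)$. The argument `shift` stands for $\theta'h(x)$, which this sketch assumes is expressed in units of $\sigma$; the maximization over $\theta$ is left to whatever optimizer is at hand. All function names are hypothetical.

```python
from math import log
from statistics import NormalDist
from typing import Callable, Iterable, Sequence

PHI = NormalDist().cdf
Curve = Callable[[float], float]


def indicator(x: float, y: float, U: Curve, L: Curve) -> int:
    """T_i of step 6: 1 if U(x) <= y <= L(x), else 0 (curves assumed increasing)."""
    return 1 if U(x) <= y <= L(x) else 0


def coverage(shift: float, s1x: float, z_alpha: float, chi_delta: float) -> float:
    """p(x, theta) with shift = theta'h(x) and s1x = s_1(x)."""
    w = chi_delta * s1x + z_alpha
    return PHI(shift + w) - PHI(shift - w)


def log_likelihood(shifts: Sequence[float], s1_values: Sequence[float],
                   t_values: Sequence[int], z_alpha: float,
                   chi_delta: float) -> float:
    """Log of the conditional likelihood: sum of T*log(p) + (1-T)*log(1-p) over SG2."""
    eps = 1e-12  # keeps log() finite when p rounds to 0 or 1
    total = 0.0
    for shift, s1x, t in zip(shifts, s1_values, t_values):
        p = min(max(coverage(shift, s1x, z_alpha, chi_delta), eps), 1.0 - eps)
        total += log(p) if t == 1 else log(1.0 - p)
    return total


def passes_check(p_hat_on_grid: Iterable[float], alpha: float) -> bool:
    """True if the estimated coverage is at least 1 - alpha at every grid point."""
    return all(p >= 1.0 - alpha for p in p_hat_on_grid)
```

A `False` result from `passes_check` corresponds to treating the measurement process as possibly defective; a `True` result means the next data point can be collected.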

Many scientists may not have the computer programs readily available to form the efficient maximum likelihood estimator of $p(x,\theta)$. In these cases we recommend using local averaging or otherwise smoothed estimates (see Stone [8] or Collomb [9]). In particular we recommend a nearest neighbor approach. Choose a number $k$ and then at each point $x$ in the calibration region average the $T_i$ values corresponding to the $k$ closest $x_i$ values to $x$. For small samples there is little known about choosing $k$; however, in large samples a
value of $k$ approximately equal to $n^{1/2}$ should be satisfactory.
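
The nearest neighbor estimate described above takes only a few lines. In this sketch the function name `knn_coverage` is hypothetical, and ties in distance are broken by the order of the data, which is an arbitrary choice.

```python
from typing import Sequence


def knn_coverage(x: float, xs: Sequence[float], ts: Sequence[int], k: int) -> float:
    """Average the T_i belonging to the k values of x_i closest to x."""
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of test points")
    # Indices of the test points ordered by distance from x (stable for ties).
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return sum(ts[i] for i in order[:k]) / k
```

Evaluating `knn_coverage` on a grid of $x$ values in the calibration region and comparing each value with $1-\alpha$ gives a smoothed version of the check in step 6.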

This procedure provides a balanced check on whether the conservativeness of the Scheffe procedure and the failure of the model to hold exactly seriously alter the hoped-for uncertainty level $1-\alpha$. The bigger the sample size, the less conservative the Scheffe procedure.

Comments about design: 

As previously stated, the Scheffe procedure is very conservative when $s_1(x)$ is large. Therefore, some of the best diagnostic information comes from data where $s_1(x) = s$. The optimal ($D$-optimal) design takes observations where $s(x)$ is at a maximum. Thus for a straight line the optimum design has observations only at the ends of the calibration region. Some of the best diagnostic information occurs at $x$-values in the middle and will be missed with this design.
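
The straight line case can be checked numerically. For a line with intercept, the standard least squares result is $s^2(x) = 1/n + (x - \bar x)^2 / \sum_i (x_i - \bar x)^2$. The sketch below compares a design with all points at the ends of $[0, 1]$ against an equally spaced one; both designs and the function name `s_line` are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def s_line(x: float, design: Sequence[float]) -> float:
    """s(x) for a straight-line fit with intercept on the given design points."""
    n = len(design)
    mean = sum(design) / n
    sxx = sum((d - mean) ** 2 for d in design)
    return sqrt(1.0 / n + (x - mean) ** 2 / sxx)


ends_only = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # all observations at the two ends
spread = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]     # equally spaced observations

for name, design in (("ends only", ends_only), ("spread", spread)):
    in_middle = sum(1 for d in design if 0.25 < d < 0.75)
    print(name, round(s_line(0.0, design), 4), round(s_line(0.5, design), 4), in_middle)
```

In both designs $s(x)$ is smallest at the middle and largest at the ends. The ends-only design gives the smaller $s(x)$ at the ends but leaves no observations in the middle, where $s(x)$ is at its minimum and the check is most informative.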

Comments about subgroups: 

Often the calibrator will have a good understanding about the possible malfunction of his measurement system. Then a choice of subgroups will be clear. He should feel free to choose as many combinations as he likes. The validity of uncertainty statements for completed calibrations remains. The check procedures are ad hoc, and if many checks are performed, he should expect some of them to indicate a possible malfunction of his system. The interpretation of these ad hoc checks requires sound scientific and engineering judgment. Some possible choices of subgroups are:

1) If we are mainly interested in detecting drift then SG1 should contain the older measurements and SG2 the newer ones. If we want to check run-to-run variability, SG1 and SG2 should not contain observations from the same run.

2) Suppose we want to check whether or not $m(x)$ has the assumed form over a subinterval $[a, b]$. Then SG1 should not contain (if possible) observations with $x_i$ values in $[a, b]$.
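
The two kinds of split listed above can be sketched with the masses and run numbers of table 1. The function names are hypothetical, and placing in SG2 exactly the points that are excluded from SG1 is one possible reading of choice 2, not the only one.

```python
from typing import List, Set, Tuple

Point = Tuple[float, int]  # (mass, run)

# Mass and run columns of table 1.
TABLE_1: List[Point] = [
    (567.004, 1), (567.2, 3), (567.22, 2), (585.772, 4), (586.091, 3),
    (604.913, 5), (604.964, 3), (623.878, 3), (680.441, 1), (680.693, 2),
    (699.204, 4), (718.321, 5), (737.333, 3), (793.881, 1), (794.134, 2),
    (812.658, 4), (831.74, 5), (850.749, 3), (907.347, 1), (907.572, 2),
    (926.108, 4),
]


def split_by_run(points: List[Point],
                 test_runs: Set[int]) -> Tuple[List[Point], List[Point]]:
    """Choice 1: SG2 gets the listed runs, SG1 the rest, so no run is shared."""
    sg1 = [p for p in points if p[1] not in test_runs]
    sg2 = [p for p in points if p[1] in test_runs]
    return sg1, sg2


def split_by_interval(points: List[Point], a: float,
                      b: float) -> Tuple[List[Point], List[Point]]:
    """Choice 2: SG1 excludes x values in [a, b]; here SG2 receives them."""
    sg1 = [p for p in points if not (a <= p[0] <= b)]
    sg2 = [p for p in points if a <= p[0] <= b]
    return sg1, sg2


sg1, sg2 = split_by_run(TABLE_1, {2})
print(len(sg1), len(sg2))  # sizes of the training and test groups
```

Holding out run 2 in this way gives 17 points in SG1 and 4 in SG2, the same group sizes as in figure 4.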

3. Analysis 

We show that if $\sigma$ is known all the $T_i$ are independent of $\hat m(x)$; in this case, our iterative check does not affect the coverage probabilities when the model defined in section 2 holds.

Theorem. When $\sigma$ is known the statistics $T_i$ are independent of $\hat\beta$.

Proof:

$T_i = 1$ if and only if

$$\hat m_1(x_i) - \sigma(\chi_\delta(p)s_1(x_i) + z_\alpha) < Y_i < \hat m_1(x_i) + \sigma(\chi_\delta(p)s_1(x_i) + z_\alpha). \qquad (2)$$

Clearly eq (2) is equivalent to

$$-(\chi_\delta(p)s_1(x_i) + z_\alpha) < \frac{Y_i - \hat m_1(x_i)}{\sigma} < \chi_\delta(p)s_1(x_i) + z_\alpha.$$

Given the least squares estimate $\hat\beta$ for $\beta$, $E[Y_i \mid \hat\beta] = \hat m(x_i)$; similarly $E[\hat m_1(x_i) \mid \hat\beta] = \hat m(x_i)$.

Thus, $Y_i - \hat m_1(x_i)$ is uncorrelated with $\hat\beta$. Since the $Y_i - \hat m_1(x_i)$ and $\hat\beta$ are jointly normal the $T_i$ are independent of $\hat\beta$. Q.E.D.

Suppose $\sigma$ is not known but estimated independently from the calibration experiment. Then if the upper and lower bounds in Scheffe [3] are modified as he indicated the desired uncertainty statements still apply. This follows from the fact that all of the $T_i$ are independent of $\hat\sigma^2$. This is not obvious to the author so the details are included.

Let $T_i$ be modified to incorporate replacement of $\sigma$ by $\hat\sigma_1$; see Scheffe [3] for details. It is to be shown that the $T_i$ are independent of $\hat\sigma^2$. Divide all sides of modified eq (2) by $\hat\sigma$. After some algebra $\hat\sigma_1/\hat\sigma$ can be written as a function of the ratio of two independent chi-squares whose sum is proportional to $\hat\sigma^2$. By applying standard change-of-variable techniques, it can be seen that $\hat\sigma^2$ is independent of this ratio. Finally the $[Y_i - \hat m(x_i)]/\hat\sigma$ are uniformly distributed on a unit sphere and are independent of $\hat\sigma^2$. Q.E.D.



Table 1. Mass-pressure calibration data.

Mass       Pressure   Run
567.004    2.0S534    1
567.2      2.0655     3
567.22     2.05974    2
585.772    2.32647    4
586.091    2.32747    3
604.913    2.58939    5
604.964    2.5881     3
623.878    2.84772    3
680.441    3.62457    1
680.693    3.61958    2
699.204    3.88191    4
718.321    4.14248    5
737.333    4.39982    3
793.881    5.17109    1
794.134    5.16728    2
812.658    5.4279     4
831.74     5.68723    5
850.749    5.94467    3
907.347    6.71461    1
907.572    6.71065    2
926.108    6.97103    4



4. Example 



The pressure-mass calibration example is based upon
data collected under the direction of J. Whetstone of 
NBS. The tank is of an experimental nature and is lo- 
cated in the fluid mechanics building at the National 
Bureau of Standards. The calibration curve relates pres- 
sure and mass measurements. In the region where the 
tank is used the calibration curve is hypothesized to be 
a straight line. However, due to bowing of the tank 
walls C. P. Reeve of NBS' Statistical Engineering Di- 
vision and the author felt a cubic model was more ap- 
propriate. This model corresponds to linear deformation 
of all the tank walls. 

The calculations made were done using the updated 
version of the program fully documented in Lechner, 
Reeve, and Spiegelman [10]. The updated program al- 
lows designation of training and test samples and auto- 
matically indicates whether or not a test point is in the 
calibration interval. Further information about this 
modification can be obtained from the author or C. P. 
Reeve.

The data are shown in table 1. In figure 2 residuals 
from the five runs are shown. Clearly run 2 is quite 
different from the others. However, as figure 3 indi- 
cates, the third run is also quite different from runs 1, 4, 
and 5. 

In all cases $\sigma^2$ is estimated from the data. For the data on hand if SG1 contains any data points from run 2 then $p(x,\hat\theta)$ is identically one. That is, the Scheffe intervals include all the data in SG2. This is true regardless of how many points are in SG1, provided it is five or more. (Note: Five is the minimum number of observations needed.) If all of the points from run 2 are in SG2 then the Scheffe intervals cover none of them. In particular if SG2 contains only the data from run 2, $p(x,\hat\theta)$ is identically zero; see figure 4.

Note that in typical cross validation procedures a fixed number of observations, usually one, is dropped out at a time and the procedure checked [11]. If this is done then the estimate of $p(x,\theta)$ is identically one. It can be shown that even if four or five observations are dropped out at one time the resulting average estimate of $p(x,\theta)$ will be nearly one. Thus, it appears that in this case purposeful choice of SG1 and SG2 is important.



5. Conclusions and Summary 

It is important to find out early whether or not a calibration procedure is in control. In particular for the example in section 4, had the new procedure been applied the experiment might have been terminated as a failure after run 3. Alternatively one additional run to compensate for run 2 may have been collected. Surely something different would have been done. Clearly, too, the Scheffe procedure is conservative enough to account for some unmodeled run-to-run variation as in run 3. Thus an iterative calibration can provide insight into the calibration procedure in a timely fashion without doing too much violence to the final uncertainty statements.

Figure 2. Residuals from runs 1-5 (pressure-mass calibration).

Figure 3. Residuals from runs 1, 3, 4, and 5.

Figure 4. Summary of cross-validation results. Data from runs 1, 3, 4, and 5 are shown in SG1; data from run 2 are shown in SG2. A value bigger than 1 in absolute value indicates an x value outside the calibration interval.



Software Package: CPR*SPLINEUPDATE
Summary of Cross-Validation Results

                      SG1            SG2
                      X included     X excluded
Inside x C.I.         17             0
Outside x C.I.        0              4
Pct Inside x C.I.     100%           0%

The author thanks J. Whetstone for providing the 
data and insight into his calibration system. The data 
were jointly examined with C. P. Reeve who has writ- 
ten a program to implement many of the procedures 
shown in this paper.